Extending Rule-based Classifiers to Improve Recognition of Imbalanced Classes
Abstract
This paper deals with inducing rule-based classifiers from imbalanced data, where one class (the minority class) is under-represented in comparison to the remaining classes (the majority classes). We discuss why standard classifiers are biased toward recognizing examples from the majority classes and tend to misclassify the minority class. To avoid the limitations of sequential covering approaches, we present a new approach that improves the sensitivity of the rule-based classifier. It modifies the structure of the rule sets: for the majority classes, minimal sets of rules are still induced, while the rule set for the minority class is generated by an algorithm called EXPLORE. This algorithm produces rules that are more general and supported by more learning examples than the rules in a minimal set. The usefulness of the new approach is verified in comparative experiments on several imbalanced data sets.
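The contrast between a minimal rule set and a larger set of general, well-supported rules can be illustrated with a small sketch. This is not the authors' EXPLORE implementation: it assumes strictly consistent rules (no coverage of other classes), a brute-force enumeration of attribute-value conjunctions, and a hypothetical toy data set. The names `minimal_cover`, `all_strong_rules`, and `min_support` are illustrative only.

```python
from itertools import combinations, product

# Hypothetical toy data: (attribute values, class label); 7 majority, 3 minority.
DATA = [
    ({"a": 0, "b": 0, "c": 0}, "maj"), ({"a": 0, "b": 0, "c": 1}, "maj"),
    ({"a": 0, "b": 2, "c": 0}, "maj"), ({"a": 0, "b": 2, "c": 1}, "maj"),
    ({"a": 2, "b": 0, "c": 0}, "maj"), ({"a": 2, "b": 2, "c": 1}, "maj"),
    ({"a": 2, "b": 0, "c": 1}, "maj"),
    ({"a": 1, "b": 1, "c": 0}, "min"), ({"a": 1, "b": 1, "c": 1}, "min"),
    ({"a": 1, "b": 0, "c": 1}, "min"),
]

def covers(rule, x):
    """A rule is a frozenset of (attribute, value) conditions, all of which must hold."""
    return all(x[attr] == val for attr, val in rule)

def candidate_rules(data):
    """Enumerate every conjunction of conditions, shortest first (brute force)."""
    attrs = sorted(data[0][0])
    values = {a: sorted({x[a] for x, _ in data}) for a in attrs}
    rules = []
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            for vals in product(*(values[a] for a in subset)):
                rules.append(frozenset(zip(subset, vals)))
    return rules

def consistent_rules(data, target):
    """Rules covering at least one target example and no example of another class."""
    out = []
    for rule in candidate_rules(data):
        hit = {i for i, (x, _) in enumerate(data) if covers(rule, x)}
        if hit and all(data[i][1] == target for i in hit):
            out.append((rule, hit))
    return out

def minimal_cover(data, target):
    """Greedy sequential covering: stop as soon as every target example is covered."""
    uncovered = {i for i, (_, y) in enumerate(data) if y == target}
    pool, chosen = consistent_rules(data, target), []
    while uncovered and pool:
        rule, hit = max(pool, key=lambda rh: len(rh[1] & uncovered))
        if not hit & uncovered:
            break  # remaining examples cannot be covered consistently
        chosen.append(rule)
        uncovered -= hit
    return chosen

def all_strong_rules(data, target, min_support):
    """Keep every consistent rule with enough support, dropping redundant specializations."""
    strong = [(r, h) for r, h in consistent_rules(data, target) if len(h) >= min_support]
    return [r for r, _ in strong if not any(other < r for other, _ in strong)]

majority_rules = minimal_cover(DATA, "maj")
minority_rules = all_strong_rules(DATA, "min", min_support=2)
```

On this toy data the greedy cover would keep a single minority rule, while the exhaustive search with `min_support=2` keeps two general rules. Under the assumptions above, the extra rule gives the minority class a second chance to match a new example.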
Related resources
A New Genetic Programming-Based Method for Weighting Fuzzy Rules in Imbalanced Classification
In classification problems, we often encounter datasets in which classes hold different percentages of patterns (i.e. classes with a high pattern percentage and classes with a low pattern percentage). These problems are called “classification problems with imbalanced data-sets”. Fuzzy rule-based classification systems are the most popular fuzzy modeling systems used in pattern classification problems. Rule weights...
Extending rule based classifiers for dealing with imbalanced data
Many real world applications involve learning from imbalanced data sets, i.e. data where the minority class of primary importance is under-represented in comparison to majority classes. The high imbalance is an important obstacle for many traditional machine learning algorithms as they are biased towards majority classes. It is desired to improve prediction of interesting, minority class exampl...
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
Improving Rule-Based Classifiers Induced by MODLEM by Selective Pre-processing of Imbalanced Data
In the paper we discuss inducing rule-based classifiers from imbalanced data, where one class (a minority class) is under-represented in comparison to the remaining classes (majority classes). To improve the ability of a classifier to recognize this class, we propose a new selective pre-processing approach that is applied to data before inducing a rule-based classifier. The approach combines se...
On Mining Fuzzy Classification Rules for Imbalanced Data
A fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification purposes. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, such that it performs poorly with respect to the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...
Publication date: 2008